ABSTRACT

We describe algorithms that detect 21cm line H I self-absorption (HISA) in
large data sets and extract it for analysis. Our search method identifies HISA as
spatially and spectrally confined dark H I features that appear as negative
residuals after removing larger-scale emission components with a modified CLEAN
algorithm. Adjacent HISA volume-pixels (voxels) are grouped into features in
(ℓ, b, v) space, and the H I brightness of voxels outside the 3-D feature
boundaries is smoothly interpolated to estimate the absorption amplitude and the
unabsorbed H I emission brightness. The reliability and completeness of our HISA
detection scheme have been tested extensively with model data. We detect most
features over a wide range of sizes, linewidths, amplitudes, and background levels,
with poor detection only where the absorption brightness temperature amplitude
is weak, the absorption scale approaches that of the correlated noise, or the
background level is too faint for HISA to be distinguished reliably from emission gaps.
False detection rates are very low in all parts of the parameter space except at
sizes and amplitudes approaching those of noise fluctuations. Absorption
measurement biases introduced by the method are generally small and appear to
arise from cases of incomplete HISA detection. This paper is the third in a
series examining HISA at high angular resolution. A companion paper (Paper II)
uses our HISA search and extraction method to investigate the cold atomic gas
distribution in the Canadian Galactic Plane Survey.
1. Introduction

The 21cm line of neutral atomic hydrogen (H I) is a key probe of the Galactic interstellar
medium. Although the cold (T < 100 K) gas distribution is difficult to map in H I emission,
H I self-absorption (HISA) allows cold foreground gas to be distinguished from warmer
background gas at the same radial velocity (Gibson 2002). Until recently, HISA has been
studied in limited low-resolution maps (e.g., Baker & Burton 1979; Bania & Lockman 1984),
or in a few isolated objects at higher resolution (e.g., van der Werf, Goss, & Vanden Bout
1988; Feldt 1993), but no detailed, systematic surveys have been made. High resolution
allows feature structure to be studied and unabsorbed background brightness to be estimated
accurately. Coverage of a wide area enables an unbiased look at the HISA population, e.g.,
without the a priori expectation that HISA is found only in molecular clouds.

High-resolution, wide-area HISA surveys have now become possible with the advent of
several major H I synthesis surveys: the Canadian Galactic Plane Survey (CGPS; Taylor et
al. 2003), the Southern Galactic Plane Survey (SGPS; McClure-Griffiths et al. 2001), and the
VLA Galactic Plane Survey (VGPS; Taylor et al. 2002). Past HISA studies have identified
absorption features by eye, but this approach is no longer adequate. The very richness of the
synthesis survey data sets requires that they be analyzed in a rigorous, repeatable manner.
We have therefore designed automated algorithms to identify and extract HISA features
from H I longitude-latitude-velocity (ℓ, b, v) data cubes.

In this paper, we describe our HISA search and extraction algorithms. We also explain
how we have tested our software with model data to determine its reliability under a range
of different conditions. Large surveys are playing an increasingly significant role in
modern astrophysics, and it is essential that their underlying methods are understood so their
results can be interpreted properly. Following criteria established in Gibson et al. (2000;
hereafter Paper I), our HISA search software seeks finely-structured dark features against
bright backgrounds that cannot be confused with simple gaps in H I emission. Although its
parameters are optimized to identify HISA in the CGPS, the software is easily adapted to
work with other surveys (e.g., the VGPS: Gibson et al. 2004).

The CGPS uses a hexagonal grid of full-synthesis fields with single-dish observations
to enable the detection of all scales of H I structure down to the synthesized beam. The
CGPS H I data have a 58″ × 58″ cosec(δ) beam, 0.824 km s⁻¹ velocity sampling, and a
field-center noise of T_rms ~ 3 K in empty channels; T_rms doubles when the 107′ primary beam
is filled with 100 K emission, and it can be up to 60% greater between field centers. The
initial phase of the survey mapped a 73° × 9° region along the Galactic plane with longitudes
74.2° < ℓ < 147.3° and latitudes −3.6° < b < +5.6° (+33.9° < δ < +68.4°), and extensions
in both ℓ and b have followed.

Below, we describe our method of HISA identification and extraction at some length
(§2) and evaluate the method’s performance with models (§3). A companion paper presents
the results of our HISA search of the 73° × 9° Phase 1 CGPS (Gibson et al. 2005; hereafter
Paper II). Subsequent papers in this series will apply the HISA search to other data sets.


2. Feature Extraction 
2.1. Identification Strategy 

2.1.1. Criteria 

Many HISA features are apparent to the eye (e.g., see Figs. 4 - 8), but a complete visual
search is unlikely to be uniform, repeatable, or thorough, and it is also impractical given the
sheer volume and complexity of the CGPS data. Thus, an automated search is needed. The
search algorithm should find features meeting simple criteria that can be confirmed by eye,
but it should also be tested with model data to evaluate its performance quantitatively (§3).

The nature and appearance of HISA dictate how it can be identified. First, while the
cold H I from which it arises can have any extent, no HISA feature can exceed the (ℓ, b, v)
boundaries of its bright background H I emission, or it ceases to be absorption. Second, HISA
must have different (ℓ, b, v) structure than the background H I for it to be distinguishable
from background fluctuations. We choose to search for HISA that is more finely-structured
than the background H I, since this is consistent with the first constraint, most CGPS HISA
that can be visually identified is of this nature, and the exceptions (e.g., Knee & Brunt 2001;
Kerton 2005) are difficult to identify algorithmically.

We seek H I features that can only be explained as HISA. We prefer this conservative
approach over the alternative of including significant false detections in our survey sample.
As given in Paper I, our conditions for distinguishing HISA from simple gaps in H I emission
are: (1) narrower line widths than most observed emission features; (2) steeper line wings;
(3) more small-scale angular structure; and (4) a minimum H I emission background level.
The first two conditions are related for Gaussian line profiles, since these have line wing
slopes proportional to amplitude over width, but real HISA need not be Gaussian. The
last condition excludes the finely-structured H I emission gaps that are common at interarm
velocities in the outer Galaxy, where smooth, bright H I backgrounds are often absent.
These four criteria exclude HISA on larger angular and velocity scales or against weaker
backgrounds, but they are adequate for capturing most visible features. We do not require
the extra condition of molecular line emission to confirm HISA features (e.g., Knapp 1974),
since many HISA features are visible without such emission in the CGPS (Paper I; Gibson
2002) and in other surveys (e.g., Peters & Bash 1987).


2.1.2. Algorithms 

We tried and rejected many different methods before selecting the algorithm described in 
this paper. Discarded techniques include various derivative measures to detect sharp edges, 
spatial and spectral curvature tests to look for dips, Fourier and wavelet filtering methods, 
and unsharp masking. Most of these were successful in locating the strongest features, but 
few were robust against noise, and many also produced large numbers of artifacts and false 
detections. The latter were especially frequent in methods that used only spectral or spatial 
searches rather than both combined. 

Our chosen method is based on a variant of the CLEAN algorithm (Högbom 1974)
developed by Steer, Dewdney, & Ito (1984) (hereafter the SDI CLEAN). We remove
large-scale spectral and spatial emission structures from the H I data iteratively and flag the
small-scale negative residuals as self-absorption features. For computational efficiency, these
operations are carried out separately on the spectrum at each spatial position (ℓ, b) and on
the channel map at each radial velocity (v) in the data cube, and the results are combined
afterward. The identified HISA is then subtracted from the H I cube, and the whole process
is repeated until significant HISA can no longer be found; such iteration allows features
larger than the chosen CLEAN scales to be mapped.

The two spectral and spatial search algorithms are described below. Each has been tuned
to find as much visually identifiable HISA as possible while minimizing false detections. The
latter are further reduced by subsequently requiring the HISA at any 3-D position (ℓ, b, v) to
be detected by both searches (§2.4). The two algorithms were tuned by visually comparing
the search output against the observed H I for many different parameter value combinations,
using a range of different HISA features with different H I emission backgrounds in the CGPS
data. The parameters that yielded the most complete HISA detections with the fewest false
detections were used in the model-based search performance evaluations (§3) and in the
CGPS HISA survey (Paper II).

In the following discussions, the Galactic coordinate variables (ℓ, b, v) are replaced by
their pixel coordinate analogs (i, j, k). Adopted values for the search parameters are given
in square brackets [ ]. These give the best performance for a HISA search of CGPS data,
but they may not be universal. In particular, the best filter scales and minimum background
level may differ for HISA searches elsewhere in the Galaxy.


2.2. Spectral Search 

At each spatial position (i, j), the spectral search algorithm builds an approximation
of the “unabsorbed spectrum” U(k) that would be observed if no HISA were present. The
algorithm assumes that U(k) can be constructed from Gaussian functions of a characteristic
width W_char that is narrower than the dominant emission features but broader than the
width of any expected HISA feature. Any channels in which the observed spectrum O(k)
deviates significantly negatively from U(k) are flagged as possible HISA.

The iterative procedure used to derive U(k) is a modification of the SDI CLEAN. U(k)
is initially set to zero, and the “residual spectrum” R(k) is set equal to S(k), a smoothed
version of O(k). Smoothed data are used to improve the signal-to-noise for the CLEANing
process. S(k) is a spatial average of N × N pixels centered at (i, j), i.e., the average of
O(i′, j′, k) intensities where |i′ − i| ≤ (N − 1)/2 and |j′ − j| ≤ (N − 1)/2 [and N = 7 pixels
= 2.1′]. Independent of this spatial averaging, the spectral rms noise σ_obs in O(k) is computed
as the lowest of three rms noise measures over equal thirds of O(k). In the CLEAN loop,
the following steps are performed:

1. If R_max, the peak value of R(k), is less than a preset fraction [3%] of the peak value of
S(k), the iteration ceases.

2. For any channel k where R(k) exceeds a given clip level [0.8] × R_max, a “correction
spectrum” C(k) is set to a preset gain [0.25] × R(k); elsewhere, C(k) is set to zero.

3. C(k) is convolved with a Gaussian whose full width at half maximum (FWHM) is
W_char [8 km s⁻¹], and the resulting spectrum is added to U(k).

4. The new residual spectrum is set to R(k) = S(k) − U(k), and σ_pos, the rms of all
positive values of R(k), is computed over all channels. As U(k) approaches S(k), σ_pos
decreases; if σ_pos < σ_obs, the iteration is terminated.

5. If the iteration has not terminated due to one of the above convergence criteria, and a
maximum number of loops [1000] has not been reached, steps 1-5 are repeated.
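The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the survey software: the function names (`gaussian_kernel`, `spectral_clean`) are hypothetical, W_char is supplied in channels rather than km s⁻¹, the spectrum is assumed to be longer than the convolution kernel, and σ_obs is passed in rather than derived from thirds of the spectrum.

```python
import numpy as np

def gaussian_kernel(fwhm_channels):
    """Unit-area 1-D Gaussian kernel with the given FWHM in channels."""
    sigma = fwhm_channels / np.sqrt(8.0 * np.log(2.0))
    half = int(np.ceil(4.0 * sigma))
    x = np.arange(-half, half + 1)
    kern = np.exp(-0.5 * (x / sigma) ** 2)
    return kern / kern.sum()

def rms_positive(R):
    """rms of the positive values of R (zero if there are none)."""
    pos = R[R > 0]
    return float(np.sqrt(np.mean(pos ** 2))) if pos.size else 0.0

def spectral_clean(S, sigma_obs, fwhm_channels, stop_frac=0.03,
                   clip=0.8, gain=0.25, max_loops=1000):
    """Estimate U(k) from the smoothed spectrum S(k); return (U, R, sigma_pos)."""
    S = np.asarray(S, dtype=float)
    kern = gaussian_kernel(fwhm_channels)
    U = np.zeros_like(S)
    R = S.copy()
    sigma_pos = rms_positive(R)
    for _ in range(max_loops):                            # step 5: loop limit
        r_max = R.max()
        if r_max < stop_frac * S.max():                   # step 1
            break
        C = np.where(R > clip * r_max, gain * R, 0.0)     # step 2
        U += np.convolve(C, kern, mode="same")            # step 3
        R = S - U                                         # step 4
        sigma_pos = rms_positive(R)
        if sigma_pos < sigma_obs:
            break
    return U, R, sigma_pos
```

Because the correction spectrum is never negative, U(k) can only grow, so a channel whose residual has gone negative stays negative for the rest of the loop. A dip much narrower than W_char is therefore left behind as a negative residual rather than being absorbed into U(k).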
After the CLEAN loop is completed, adjacent channels where R(k) < F × σ_pos, with a
factor F [−2.0], are grouped into “segments” of suspected HISA. In each segment, S(k) and U(k)
are evaluated at the channel k_min where R(k) has a local minimum. If S(k_min) falls below a
noise threshold set by F σ_pos, the segment is rejected as a likely noise feature. If
U(k_min) < T_crit, where T_crit is a preset brightness level [30 K], the segment is rejected as
having an insufficiently bright background to identify HISA clearly. A second, more
conservative T_crit [70 K] is applied later when the spatial and spectral search results are
combined (§2.4).
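The grouping of flagged channels into segments and the background-brightness test can be sketched as follows. The helper names (`find_segments`, `keep_segment`) are hypothetical, and only the T_crit test is shown; the noise test on S(k_min) is omitted from this illustration.

```python
import numpy as np

def find_segments(R, sigma_pos, F=-2.0):
    """Return (start, stop) pairs, stop exclusive, for each run of adjacent
    channels where R(k) < F * sigma_pos."""
    flagged = np.asarray(R) < F * sigma_pos
    padded = np.concatenate(([False], flagged, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)     # False -> True transitions
    stops = np.flatnonzero(edges == -1)     # True -> False transitions
    return list(zip(starts.tolist(), stops.tolist()))

def keep_segment(R, U, start, stop, T_crit=30.0):
    """Locate k_min in the segment and test U(k_min) against T_crit."""
    k_min = start + int(np.argmin(R[start:stop]))
    return bool(U[k_min] >= T_crit), k_min
```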

A Gaussian profile is fitted to the absorption magnitude spectrum R(k) of each
remaining channel segment. Real HISA line profiles may not be Gaussian, but this shape is assumed
for simplicity. For computational speed, the central channel of the Gaussian is fixed at k_min,
and the FWHM is fixed at one of two values, W_narrow [2.0 km s⁻¹] or W_broad [4.0 km s⁻¹],
to capture HISA with a range of widths. Model tests (§3) show that many features outside
this range are also detected. In each fit, a sloping linear base level is derived along with the
Gaussian amplitude A and the standard deviation σ_fit of the fit. A fit is rejected as
statistically unreliable if A/σ_fit < D [2.0] or A/σ_pos < D. It can also be rejected if O(k) lacks a
morphological “dip” at k_min. This is determined with a filter function that returns a value
of 1.0 for a dip between two equal peaks, 0.5 for a “dip” that drops only to the level of the
adjacent spectral data on one side of k_min, and 0.0 on a linearly rising or falling spectrum.
Fits that return a value below the chosen threshold [0.6] are rejected. This filter inhibits
the detection of HISA on the edges of emission features that are steeper than W_char would
allow; otherwise, significant false HISA detections result. For accepted fits, the channels in
a narrow or broad HISA spectrum (both initially zero) are set equal to the fitted Gaussian
if the amplitude exceeds a given fraction [5%] of A. Then, a “detected” HISA spectrum
is created that consists of the maximum value in each channel from the narrow and broad
Gaussian fits to the HISA line profile, to ensure full detection of the feature. In the case
where the broad line wings do not correspond to real HISA in a narrow feature, these will
not be detected by the spatial search and will be removed at a later stage of analysis.
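The text specifies only three limiting values of the morphological dip filter, not its functional form. As an illustration, the hypothetical function below is one simple choice that reproduces those three values; both the formula and the `half_width` parameter (the channel offset at which the adjacent spectrum is sampled) are assumptions made for this sketch.

```python
def dip_score(O, k_min, half_width):
    """Hypothetical dip filter: 1.0 for a dip between two equal peaks,
    0.5 for a one-sided drop, 0.0 for a linear slope or an emission peak."""
    left = O[k_min - half_width] - O[k_min]     # rise toward lower channels
    right = O[k_min + half_width] - O[k_min]    # rise toward higher channels
    scale = max(abs(left), abs(right))
    if scale == 0.0:
        return 0.0                              # flat spectrum: no dip
    return max(0.0, (left + right) / (2.0 * scale))
```

With the adopted threshold of 0.6, a one-sided drop (score 0.5) would be rejected by this function, which matches the stated purpose of suppressing detections on the edges of emission features.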

Finally, when the HISA amplitudes have all been computed, these are spatially smoothed
[with a 1.5′ beam] to join together groups of flagged “flecks” into more coherent features,
and those that are sufficiently weak and isolated are culled if their amplitude falls below a
specified threshold [2 K]. An additional cosmetic improvement is made by excluding strong
H I continuum absorption (HICA) from the set of spectrally-identified HISA. This is done by
dropping any sight lines from the search that contain channels whose continuum-subtracted
line brightness is significantly negative, i.e., if O(k) falls below a preset negative multiple
of σ_obs. Weaker HICA will survive this filter to contaminate the set of detected HISA. Such
contamination is difficult to remove in a way that leaves the “pure” HISA in the same sight
lines intact.
To illustrate the algorithm, we plot the H I spectrum of Paper I’s Perseus HISA Globule
(ℓ = 139.635°, b = 1.185°) in Figure 1 and several stages of the algorithm’s analysis in
Figure 2. These H I data supersede those used in Paper I, which contained a flaw that had
no serious impact on the results. Appendix A gives further details.

In Figure 2, the estimation of U(k) converged when the peak residual became less than
3% of the initial spectral peak, giving σ_pos = 3.6 K (the dashed line indicates a negative
deviation of twice this value). Suspected HISA was identified in eight channel segments.
One of these (4) had no counterpart dip in O(k). Six others were rejected because U(k) <
70 K. For simplicity, we used T_crit = 70 K here to show what would survive the ultimate
T_crit = 70 K filter. In the remaining segment (5), channels in which the “merged” HISA
spectrum (maximum of the narrow and broad HISA spectra) is non-zero were then flagged
as having “detected” HISA. Note that, because of the initial smoothing, the detected HISA
has a smaller amplitude than in Figure 1. The full HISA amplitude is recovered when the
spectral and spatial search results are merged (§2.4).


2.3. Spatial Search 

The spatial search algorithm is similar in principle to the spectral search, although it
does not attempt to fit absorption features with Gaussian shapes, nor does it require that
they satisfy a morphological “dip” filter. It begins by estimating the unabsorbed brightness
distribution U(i, j) in a given spectral channel k. The algorithm assumes that U(i, j) can
be constructed from two-dimensional circular Gaussian components of a characteristic width
G_char that is narrow enough to represent most H I emission structure but broader than any
expected HISA features. Clearly the choice of G_char limits the angular size of HISA features
that will be detected, although this can be alleviated with repeated searches.

The iterative procedure used to derive U(i, j) is again a modification of the SDI CLEAN.
U(i, j) is initially set to zero, and the residual map R(i, j) is set equal to S(i, j), a spatially
smoothed copy of the observed channel map O(i, j). Use of S(i, j), computed as an N × N
pixel average of O(i, j) [with N = 15 pixels = 4.5′], improves the CLEAN convergence. On
a larger angular scale [20′], an estimate of the typical rms noise σ_obs in O(i, j) and its gross
variation across the channel map are derived in a manner similar to the spectral rms noise
in §2.2. From this, the average rms noise in S(i, j), termed σ_sm, and its variation over the
map are deduced. In the CLEAN loop, the following steps are performed:

1. R_max, the Mth [10th] highest value of R(i, j), is found. This is chosen rather than the
peak value so that the iteration process is not dominated by one noisy pixel. If R_max
is less than a preset fraction [3%] of the peak value of S(i, j), the iteration ceases.

2. For any pixel (i, j) where R(i, j) exceeds a given clip level [0.5] × R_max, a correction
map C(i, j) is set to a preset gain [0.25] × R(i, j); elsewhere, C(i, j) is set to zero.

3. C(i, j) is convolved with a 2-D Gaussian whose FWHM is G_char [20′], and the resulting
image is added to U(i, j).

4. The new residual map is set to R(i, j) = S(i, j) − U(i, j), and the rms value σ_pos of the
positive values of R(i, j) is computed. In deriving σ_pos, allowance is made for the fact
that the noise may vary across the image by applying suitable weights to the values of
R(i, j). If σ_pos < σ_sm, the iteration is terminated.

5. If the iteration has not terminated due to one of the above convergence criteria, and a
preset maximum number of loops [1000] has not been reached, steps 1-5 are repeated.
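The spatial loop differs from the spectral one mainly in its use of the Mth-highest residual and a 2-D Gaussian. A minimal sketch follows, with hypothetical function names and several simplifying assumptions: G_char is given in pixels, the noise is treated as uniform across the map (no weights in σ_pos), the map is assumed larger than the convolution kernel, and the circular Gaussian is applied as two separable 1-D passes.

```python
import numpy as np

def mth_highest(values, M=10):
    """Return the M-th highest value of an array (M=1 is the maximum)."""
    flat = np.ravel(values)
    idx = flat.size - M
    return np.partition(flat, idx)[idx]

def smooth2d(image, fwhm_pixels):
    """Convolve with a unit-volume circular Gaussian via two 1-D passes."""
    sigma = fwhm_pixels / np.sqrt(8.0 * np.log(2.0))
    half = int(np.ceil(4.0 * sigma))
    x = np.arange(-half, half + 1)
    kern = np.exp(-0.5 * (x / sigma) ** 2)
    kern /= kern.sum()
    rows = np.apply_along_axis(np.convolve, 1, image, kern, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, kern, mode="same")

def spatial_clean(S, sigma_sm, fwhm_pixels, M=10, stop_frac=0.03,
                  clip=0.5, gain=0.25, max_loops=1000):
    """Estimate U(i, j) from the smoothed channel map S(i, j); return (U, R)."""
    S = np.asarray(S, dtype=float)
    U = np.zeros_like(S)
    R = S.copy()
    for _ in range(max_loops):                            # step 5: loop limit
        r_max = mth_highest(R, M)                         # step 1
        if r_max < stop_frac * S.max():
            break
        C = np.where(R > clip * r_max, gain * R, 0.0)     # step 2
        U += smooth2d(C, fwhm_pixels)                     # step 3
        R = S - U                                         # step 4
        pos = R[R > 0]
        sigma_pos = np.sqrt(np.mean(pos ** 2)) if pos.size else 0.0
        if sigma_pos < sigma_sm:                          # unweighted rms
            break
    return U, R
```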

After the CLEAN loop is completed, all pixels where R(i, j) < F × σ_pos(i, j), with a
factor F [−2.0], are noted, as are those where O(i, j) − U(i, j) < F σ_obs(i, j). A map of “suspected
HISA” is set equal to U(i, j) − O(i, j) for all pixels (i, j) where either condition is met
and zero elsewhere. This map is then filtered to remove pixels with amplitudes less than
a specified cutoff [4 K], as well as those for which U(i, j) < T_crit [30 K]. Lastly, as in the
spectral algorithm, the suspected HISA map is smoothed [with a 1.5′ beam] to improve
feature coherence, and a final cull is made of smoothed amplitudes below a lower threshold
[2 K]; surviving features are deemed “detected HISA”.
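The flagging and filtering that follow the spatial CLEAN loop can be written compactly with boolean masks. The sketch below uses a hypothetical function name, accepts the noise levels as scalars or maps, and leaves out the final 1.5′ smoothing and 2 K cull.

```python
import numpy as np

def suspected_hisa(O, U, R, sigma_pos, sigma_obs, F=-2.0,
                   min_amp=4.0, T_crit=30.0):
    """Map of suspected HISA amplitudes U - O (positive for absorption)."""
    flagged = (R < F * sigma_pos) | ((O - U) < F * sigma_obs)
    amp = np.where(flagged, U - O, 0.0)
    amp[(amp < min_amp) | (U < T_crit)] = 0.0   # amplitude and background cuts
    return amp
```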

The spatial search is illustrated in Figure 3, with longitude profiles taken through the
Perseus HISA Globule position (b = 1.185°, v = −41.04 km s⁻¹) from one channel map
at different stages of processing. The determination of U(i, j) took 114 iterations, ending
when σ_pos became less than σ_sm = 2.16 K.


2.4. HISA Amplitude Estimation 

2.4.1. General Approach

The physical properties of the absorbing gas cannot be understood without knowing the
HISA brightness temperature amplitude ΔT = T_ON − T_U, where T_ON = O(i, j, k) is
the observed brightness on the HISA feature, and T_U = U(i, j, k) is the unabsorbed
emission that would be measured if no HISA were present. Since only T_ON is directly observed
at the HISA position, T_U must be estimated from T_OFF, the H I brightness off the HISA
feature in space and/or velocity. For clarity, we note that T_U represents all of the unabsorbed
emission along the line of sight at radial velocity v. The emission from behind the HISA
feature that is subject to absorption is p T_U, where 0 ≤ p ≤ 1; the exact value of p depends
upon the sight-line geometry (see Paper I).

Several means of estimating T_U have been used in past studies. In the spectral domain,
the velocity edges of a HISA feature can be fitted with straight lines (Hasegawa et al. 1983;
Montgomery et al. 1995) or more complex functions (Knapp 1974; McCutcheon et al. 1978; Li
& Goldsmith 2003; Kavars et al. 2003) to estimate T_U at intervening velocities. In the spatial
domain, the H I brightness at positions adjacent to the HISA feature can be used directly as
T_U (Paper I; Minter et al. 2001) or as anchor points for spatial fits across the feature (Feldt
1993; Kavars et al. 2003). A variant on this approach assumes HISA is sufficiently diluted
in the broad beam of a single dish telescope to use the single dish spectrum at the feature
position as T_U (van der Werf et al. 1988).

Our approach is more general. We group the HISA volume-pixels (voxels) identified in
§§2.2-2.3 into contiguous 3-D features in the spectral line cube. For each feature, we estimate
T_U by interpolating the values of the non-HISA voxels that border the feature in (ℓ, b, v)
space. The interpolation uses a 3-D Gaussian weighting function to ensure smoothness on
the scale of the feature. Specifically, at each position (ℓ, b, v) within the HISA feature,


$$ T_U(\ell, b, v) = \frac{\sum_{n=1}^{N} w_n \, T_{\rm OFF}(\ell_n, b_n, v_n)}{\sum_{n=1}^{N} w_n} , \qquad (1) $$

where n indexes the list of N off-HISA voxels with coordinates (ℓ_n, b_n, v_n), and the weight
w_n is given by

$$ w_n = \exp\left[ -\frac{(\ell - \ell_n)^2}{2\sigma_\ell^2} - \frac{(b - b_n)^2}{2\sigma_b^2} - \frac{(v - v_n)^2}{2\sigma_v^2} \right] . \qquad (2) $$

The Gaussian dispersions (σ_ℓ, σ_b, σ_v) are set so that each FWHM (= σ √(8 ln 2)) is half the
maximum length of any contiguous row of HISA voxels in that dimension, with FWHM
lower limits of 1.2′ and 3.3 km s⁻¹ and upper limits of 20′ and 8 km s⁻¹, the HISA search
CLEAN scales. The result is somewhat similar to that of 1-D spectral fitting methods, but
its structure is constrained by all three dimensions of the H I data. This method yields
T_U and ΔT estimates superior to those of our separate spectral and spatial searches.
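The Gaussian-weighted interpolation can be illustrated for a single feature as follows. This is a sketch under assumptions, not the survey code: the function names are hypothetical, "bordering" is taken to mean the six face-adjacent neighbours of each HISA voxel, the dispersions are supplied in voxel units in the same axis order as the cube, and the full weight matrix is held in memory, which is practical only for small features.

```python
import numpy as np

def border_voxels(mask):
    """Non-HISA voxels that share a face with at least one HISA voxel."""
    grown = mask.copy()
    for axis in range(mask.ndim):
        lo = [slice(None)] * mask.ndim
        hi = [slice(None)] * mask.ndim
        lo[axis] = slice(0, -1)
        hi[axis] = slice(1, None)
        grown[tuple(hi)] |= mask[tuple(lo)]    # shift by +1 along this axis
        grown[tuple(lo)] |= mask[tuple(hi)]    # shift by -1 along this axis
    return grown & ~mask

def interpolate_tu(cube, mask, sigmas):
    """Replace each HISA voxel with the Gaussian-weighted mean of T_OFF."""
    on = np.argwhere(mask)                     # HISA voxel coordinates
    off = np.argwhere(border_voxels(mask))     # off-HISA border coordinates
    t_off = cube[tuple(off.T)]
    d = (on[:, None, :] - off[None, :, :]) / np.asarray(sigmas, dtype=float)
    w = np.exp(-0.5 * np.sum(d ** 2, axis=2))  # Gaussian weights w_n
    t_u = cube.astype(float).copy()
    t_u[tuple(on.T)] = (w @ t_off) / w.sum(axis=1)
    return t_u
```

The absorption amplitude then follows as `cube - t_u`, which is zero outside the flagged voxels and negative where the observed brightness lies below the interpolated background.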



2.4.2. Filtering


The T_U estimation algorithm considers two confidence levels of HISA. First, a union
filter requiring a HISA identification from either the spectral or spatial search is applied.
This filter includes nearly every HISA feature that the eye can detect, as well as many
non-HISA features that are discarded later. All accepted voxels are grouped into tentative HISA
features and interpolated over to obtain T_U, which is subtracted from the unsmoothed H I
data to get ΔT with the full CGPS angular resolution. Then, an intersection filter requiring
HISA identification in both spectral and spatial searches is applied. Voxels not satisfying this
filter are unflagged as HISA, and their T_U is reset to the observed H I brightness. Computing
T_U and ΔT for a union voxel set and applying an intersection filter afterward ensures that
(1) only the most likely HISA features survive, and (2) any “penumbral” contamination
from undetected HISA in their T_OFF voxels is minimized; otherwise, T_U and |ΔT| could
be significantly underestimated. An alternative is to interpolate only over HISA satisfying
the intersection filter with all union-filter voxels dropped from the T_OFF ensemble, but this
frequently leaves too few edge voxels for a robust T_U estimate.
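The order of operations (interpolate over the union, then keep only the intersection) is easy to express with boolean masks. In this sketch the function name is hypothetical and `T_U_union` stands for a T_U cube already interpolated over the union-flagged voxels.

```python
import numpy as np

def apply_confidence_filters(spec_flag, spat_flag, O, T_U_union):
    """Return (union, both, T_U, delta_T) for the two confidence levels."""
    union = spec_flag | spat_flag            # flagged by either search
    both = spec_flag & spat_flag             # flagged by both searches
    T_U = np.where(both, T_U_union, O)       # reset T_U to O where unflagged
    delta_T = O - T_U                        # zero outside the intersection
    return union, both, T_U, delta_T
```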

Three additional filters are applied with the intersection filter. Voxels with ΔT > 0 are
discarded, as are those in the noisy peripheries of the survey and those for which T_U < T_crit
[70 K], a stricter value than the previous T_crit [30 K] of §§2.2-2.3. The peripheral culling
rejects HISA voxels with CGPS field mosaic weights w_m < 0.382, the lowest weight that
occurs between synthesis field centers. Since w_m ∝ T_rms⁻², this allows a maximum noise of
1.618 times the field center value (see Taylor et al. 2003), which is typically 5-7 K for the
T_U ~ 70-130 K levels of our HISA features.
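The factor of 1.618 follows directly from the weight cutoff if one assumes, as we do here for illustration, that the mosaic weight is normalized to unity at a field center and scales as the inverse noise variance:

$$ \frac{T_{\rm rms}(w_m = 0.382)}{T_{\rm rms}(w_m = 1)} = w_m^{-1/2} = 0.382^{-1/2} \approx 1.618 . $$

For the empty-channel field-center noise of about 3 K quoted in §1, this corresponds to roughly 3 K × 1.618 ≈ 4.9 K at the cutoff, before any increase caused by bright emission in the primary beam.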

The choice of T_crit = 70 K is empirically based. As noted in Paper I and §2.1,
finely-structured H I emission is common in the CGPS data where the total amount of emission
is low, i.e., off the plane and at interarm velocities. Without some sort of T_crit filtering,
the HISA identification software is easily fooled in these regions, flagging many false HISA
features adjacent to and between sharp-edged emission. We are confident these are false HISA
features, since the absorbing H I would have to be unrealistically cold to absorb against such
faint H I backgrounds, and such apparently strong features are far less abundant in brighter
emission fields where they should be easier to detect and where more gas should be found
generally. There is no single T_crit value that excludes all such false HISA while retaining all
real HISA. We chose T_crit = 70 K to balance these two needs, with greater priority placed on
the first. For the CGPS, 70 K rejects essentially all false HISA arising from sharp emission
edges while keeping most real HISA. The model tests of §3 show that the false HISA rejection
is quite successful. A few cases of some real HISA being missed or truncated are discussed
below and in Paper II.
2.4.3. Examples

Figures 4-6 illustrate the HISA amplitude extraction process with sample channel maps
and spectra that include the same features shown in Figures 1-3. These give the initial
H I data, the ΔT amplitudes computed in different stages of the analysis, and the final
T_U. As the figures show, most of the visually apparent HISA is readily extracted. Some
residual HISA remains in T_U, but its amplitude is a few K at most; the ΔT value of −40 K
extracted for the Perseus Globule is only 2 K weaker than that obtained with the more
conservatively-chosen Paper I spatial boxes (Table 1).

The HISA identification software cannot detect features larger than its CLEAN scales
[8 km s⁻¹ and 20′] and instead flags only their darkest parts. To overcome this limitation, we
feed the T_U cubes back into our search algorithms to identify HISA missed on the previous
pass (§2.1). Subsequent ΔT extractions are made with unions of HISA flags from all prior
search passes but always use the original H I data for T_ON. Three such passes are adequate
for the CGPS H I data set. Of the HISA voxels extracted in all three passes, 86.5% were found
in the first pass, 11.2% in the second, and only 2.3% in the third. For brevity, Figures 4-6
display only third-pass results, which differ little from the first pass for this example. A case
with more dramatic differences between passes is illustrated in Figures 7 & 8. Features this
big require multiple passes to capture. HISA flagging is significantly improved after the first
pass in both the spatial and spectral domains. We show only first- and third-pass results
here, since the second pass closely resembles the third. After three passes, HISA flagging is
incomplete in only a few places due to T_U < 70 K truncation, mostly near the northern edge
of the map (Fig. 7). Aside from minor losses from T_U changes, the flagged HISA generally
increases, with the fraction of the total flagged per pass being 72.9%, 22.5%, and 4.6% for
passes 1, 2, and 3. The smooth T_U(ℓ, b, v) structure in Figures 7 & 8 shows that our T_U
estimation method follows the large-scale H I emission brightness reasonably well.


2.4.4. A Note on the Assignment of Structure

We have chosen to attribute fine-scale structure in T_ON to ΔT, leaving T_U smooth on
the scale of the HISA feature. This approach presumes that the absorbing gas is finely
structured and the H I background is not, consistent with our adopted HISA identification
strategy (§2.1). Such consistency allows the identified HISA structure to be removed so that
subsequent search passes do not flag it again. If, however, some T_ON structure arises from
T_U (e.g., Knee & Brunt 2001), the true ΔT is smoother than we have found. We feel that our
choice of method is reasonable for most circumstances. Small-scale H I emission structure is
common in the general ISM but appears minimized in the bright, smooth H I fields where
we see most CGPS HISA.


3. Survey Reliability and Completeness 
3.1. Motivation 

The eye is the first means of identifying HISA features that meet the appropriate criteria,
and the search algorithms of §2 were designed primarily to mimic visual detection. However,
the eye can be fooled; for example, it often finds false patterns in noise, perhaps due to
evolutionary pressures to spot predators (Peebles 1993). We made our HISA search and
extraction algorithms as rigorous as possible, but they remain limited by a number of factors,
including:

1. T_U faintness: To avoid confusion with emission gaps at low column densities, HISA
detection is blocked if T_U < 70 K. Where this occurs, small features or parts of large
ones may be missed.

2. T_U underestimation: We assume T_U is not finely structured and estimate it from
T_OFF voxels surrounding the HISA feature in 3-D. However, many HISA features occur
near spectral emission peaks. If T_U > T_OFF there, then we underestimate T_U and |ΔT|.
Both can also be underestimated if HISA flagging is incomplete and an unidentified
“penumbra” of faint HISA contaminates T_OFF.

3. Noise degradation: Despite smoothing, some low-amplitude HISA will be lost to
noise. Whole features may be missed, or just their cores may be detected, making
them appear smaller, clumpier, and more fragmented than they really are. In addition,
false HISA detections will be introduced by noise fluctuations at low |ΔT|.

4. Overlarge Structure: By design, the HISA search algorithms cannot flag whole
features larger than the adopted 20′ and 8 km s⁻¹ CLEAN filter scales. The use of
multiple search passes eases this limitation but may not remove it entirely.

5. Unresolved Structure: Small-scale HISA structure may be diluted or missed entirely
if undersampled. Angular structure down to the 1′ CGPS beam is seen (Paper I),
so smaller-scale structure seems likely. HISA linewidths narrower than the CGPS
Nyquist limit of 1.65 km s⁻¹ also exist (Knapp 1974; Li & Goldsmith 2003). Such
linewidths are rare in random HICA sight lines (e.g., Colgan, Salpeter, & Terzian
1988), but since continuum backgrounds can be brighter than H I backgrounds, HICA
can include warmer absorbing gas than HISA, and it is possible that HISA lines may
be narrower on average. HISA velocity dilution is thus a real concern in synthesis
surveys, while angular dilution will be more severe for single-dish telescopes; no present
instrumentation can adequately sample both the angular and velocity structure of
HISA.

To evaluate such limitations objectively and quantitatively, we tested our software’s 
ability to extract HISA features from model H I data. Our goal was to understand (1) what 
fraction of HISA the software detects, (2) how many detections are false positives, and (3) 
how much the detected features differ in size and amplitude from their input versions. 


3.2. Models 

The model 21cm spectral line cubes were sums of noisy, positive-amplitude emission
backgrounds and noise-free, negative-amplitude absorption features. Gas properties and
radiative transfer effects were not considered, as these are irrelevant to the detection
software’s performance. To test this performance under varying conditions, 64 randomly
configured model cubes were made. Each cube used the standard CGPS pixel and channel
sizes, with dimensions half those of a standard CGPS mosaic cube for computational
efficiency: 2.56° x 2.56° x 106 km s^-1 (512 x 512 x 128 voxels). Sample model data are
shown in Figure 9.


3.2.1. Absorption Features 

Each model HISA feature was given a cylindrical shape in the H I line cube, with a
Gaussian velocity profile and a flat-disk spatial profile convolved with a 60" circular beam.
Although simple, these angular and velocity profiles are similar enough to typical HISA for
testing purposes. The features, known as “hockey pucks” for their usually oblate aspects in
the CGPS voxel grid, are parameterized by their unconvolved angular FWHM Δθ_p, velocity
FWHM Δv_p, and (negative) central amplitude ΔT_p.

2048 hockey pucks were inserted into each model cube with random sizes, amplitudes,
and positions. The (ℓ, b, v) and ΔT_p distributions were uniformly random, except that puck
overlaps in (ℓ, b, v) were prevented, with a minimum separation of 1 voxel enforced between
pucks at an absorption threshold of 0.005 K. The Δθ_p and Δv_p distributions were skewed
toward small features, with relative probabilities of P(Δθ_p) ∝ Δθ_p^-2 and P(Δv_p) ∝ Δv_p^-1.
This was done to counter the fact that larger pucks have more voxels. We measured angular
and velocity widths locally from each voxel in the performance analysis (§3.3), and P(Δθ_p)
and P(Δv_p) made our voxel-based size distributions more evenly sampled. The puck
parameter ranges used were 0.1' < Δθ_p < 60', 0.355 km s^-1 < Δv_p < 16.0 km s^-1, and
-1 K > ΔT_p > -40 K. Δθ_p = 0.1' results in “unresolved” structures that get diluted in the
CGPS beam. Similarly, Δv_p = 0.355 km s^-1, which would occur for purely thermal H I
linewidths at 2.73 K, would be unresolved by the CGPS in velocity. Both cases test the
detection limits for fine-scale HISA structure. At the other extreme, the software’s
sensitivity to structures larger than the 20' and 8 km s^-1 CLEAN filter scales is also tested.
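The skewed size distributions described above can be drawn by inverting the power-law cumulative distribution. The following is a minimal sketch only, not the authors' code: the function names are hypothetical, and the power-law indices of -2 (angular width) and -1 (velocity width) are assumptions of this sketch, chosen so that the sampling offsets the larger voxel counts of bigger pucks.

```python
import math
import random


def sample_power_law(lo, hi, index, rng):
    """Draw x in [lo, hi] with density p(x) proportional to x**index."""
    u = rng.random()
    if index == -1:
        # p(x) ~ 1/x has a logarithmic CDF, so x is log-uniform.
        return lo * (hi / lo) ** u
    k = index + 1.0
    # Invert the CDF of x**index between lo and hi.
    return (lo ** k + u * (hi ** k - lo ** k)) ** (1.0 / k)


def sample_puck(rng):
    """Return one illustrative (dtheta [arcmin], dv [km/s], dT [K]) set."""
    dtheta = sample_power_law(0.1, 60.0, -2, rng)  # assumed index -2
    dv = sample_power_law(0.355, 16.0, -1, rng)    # assumed index -1
    d_t = -rng.uniform(1.0, 40.0)                  # uniform amplitude
    return dtheta, dv, d_t
```

For example, `sample_puck(random.Random(0))` returns one parameter triple inside the ranges quoted in the text; positions and the overlap check are not modeled here.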


3.2.2. Emission Background 

The background emission fields were similarly constructed of random ensembles of
cylindrically symmetric components. These differed from the hockey puck absorption
features in that they had positive amplitudes, simple Gaussian angular profiles, and minimum
sizes equal to the CLEAN scales. They were also allowed to overlap and fill the entire cube,
so we refer to them as emission components rather than discrete features. Size ranges were
20' < Δθ_ec < 120' and 8 km s^-1 < Δv_ec < 20 km s^-1, with P(Δθ_ec) and P(Δv_ec) the same
as for the HISA pucks. The amplitude range was +1 K < ΔT_ec < +20 [V_ec/V_ec,max] K,
where the component volume V_ec = Δθ_ec^2 Δv_ec, V_ec,max = (120')^2 x 20 km s^-1, and the
ΔT_ec distribution was further skewed as P(ΔT_ec) ∝ ΔT_ec^-1. These adjustments placed most
of the power at large scales, as is seen in real H I emission (Green 1993). 4096 components
were summed to make each model cube’s emission field. This was subsequently rescaled to
give a median brightness temperature of 70 K, so that half the cube on average would allow
HISA detections, and T_U ~ 70 K effects could be easily studied.

For greater realism, noise was added to the H I model. A 3-D field of uncorrelated
Gaussian random voxel noise was convolved with a 60" FWHM Gaussian beam and a
1.319 km s^-1 FWHM Gaussian velocity point spread function (PSF) to mimic the structure of
correlated noise in the CGPS data, and the rms noise amplitude was scaled to match the 6 K
level found in CGPS field centers filled with 100 K emission. Unlike the CGPS noise, the
model noise does not vary with distance from field centers, its beam is
declination-independent, and its velocity PSF is not the true CGPS velocity PSF, which is
the Fourier transform of a Gaussian truncated at 20% of peak amplitude, with an effective
FWHM of 1.319 km s^-1 = 1.6 channels. However, none of these differences should seriously
affect the performance analysis.
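The noise construction can be illustrated along the velocity axis alone. The sketch below is a simplified 1-D illustration under stated assumptions, not the authors' procedure: it smooths white Gaussian noise with a Gaussian PSF of 1.6 channels FWHM and rescales the result to a target rms. The function names and the kernel half-width are hypothetical, and the 60" angular beam convolution is omitted.

```python
import math
import random


def gaussian_kernel(fwhm_channels, half_width=6):
    """Unit-sum Gaussian kernel sampled on the channel grid."""
    sigma = fwhm_channels / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    weights = [math.exp(-0.5 * (i / sigma) ** 2)
               for i in range(-half_width, half_width + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def correlated_noise(n_channels, fwhm_channels, target_rms, rng):
    """White noise smoothed by a Gaussian PSF, rescaled to target_rms."""
    kernel = gaussian_kernel(fwhm_channels)
    half = len(kernel) // 2
    # Pad the white-noise spectrum so every output channel is fully sampled.
    white = [rng.gauss(0.0, 1.0) for _ in range(n_channels + 2 * half)]
    smooth = [sum(k * white[i + j] for j, k in enumerate(kernel))
              for i in range(n_channels)]
    rms = math.sqrt(sum(x * x for x in smooth) / n_channels)
    return [x * target_rms / rms for x in smooth]
```

Calling `correlated_noise(128, 1.6, 6.0, random.Random(1))` gives a 128-channel spectrum with an rms of 6 K whose neighboring channels are positively correlated, which is the property the model noise is meant to share with the survey data.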


3.3. Analysis

The same procedures used to search for HISA in the CGPS data (§2) were applied to 
the model data. The software performance was then evaluated by comparing the input and 
extracted absorption. Each of the 64 model cubes was analyzed separately, and the results 
were merged afterward to maximize coverage of the model parameter space. 


3.3.1. Measurement of HISA Observables 

Four observables were extracted from the HISA data: the absorption amplitude ΔT,
unabsorbed brightness T_U, angular width Δθ, and velocity width Δv of the absorption.
All four were measured at each voxel (ℓ, b, v) position rather than on a per-feature basis,
because the CGPS HISA has complex structure, and properties can vary within one feature.
However, Δθ and Δv are still aggregate properties that depend on the local distribution of
HISA around them.

The velocity width Δv measures the line FWHM. For each HISA voxel, all HISA
contiguous in v at the same (ℓ, b) position is examined to find the channel with maximum
|ΔT|. On either side of this channel, the closest channels for which |ΔT| < 0.5 |ΔT|_max are
identified; non-HISA channels with ΔT = 0 are included if necessary. The half-maximum
velocities are refined to sub-channel accuracy by linear interpolation. The difference between
them is Δv. This Δv is assigned to all HISA voxels in the same velocity grouping at the
same (ℓ, b). We make no attempt to correct for instrumental broadening (e.g., Montgomery
et al. 1995), since this is nontrivial for the complex line structure of some HISA, and only the
narrowest features will be broadened significantly in the CGPS. Figure 10 shows a sample
map of Δv. On average, the broader linewidths occur in larger HISA features.
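The FWHM measurement just described can be sketched for a single velocity grouping as follows. This is an illustrative implementation under stated assumptions, not the survey software: the function name is hypothetical, the input is one contiguous HISA group padded with zero-amplitude (non-HISA) channels, and a side on which the profile never falls below half maximum is simply truncated at the array edge.

```python
def fwhm_width(delta_t, channel_width):
    """FWHM of one absorption profile, in the units of channel_width.

    delta_t holds the (negative) HISA amplitudes of one contiguous velocity
    group, padded with 0.0 for the non-HISA channels on either side.
    """
    amp = [abs(x) for x in delta_t]
    peak = max(range(len(amp)), key=amp.__getitem__)
    half = 0.5 * amp[peak]

    def crossing(step):
        i = peak
        # Walk outward to the closest channel below half maximum.
        while 0 <= i + step < len(amp) and amp[i] >= half:
            i += step
        if amp[i] >= half:
            return float(i)  # never drops below half maximum on this side
        inner = i - step
        # Linear interpolation between the last channel at or above half
        # maximum and the first channel below it.
        frac = (amp[inner] - half) / (amp[inner] - amp[i])
        return inner + step * frac

    return (crossing(+1) - crossing(-1)) * channel_width
```

For a triangular profile such as `[0.0, -2.0, -4.0, -2.0, 0.0]`, the half-maximum crossings fall exactly on the neighboring channels, so the width is two channel widths.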

The angular width Δθ measures the diameter of the largest circle containing the (ℓ, b)
position and zero non-HISA voxels at the same velocity. This scheme measures the
edge-to-edge feature width on a local scale. Unlike the FWHM-based Δv, Δθ uses the full
HISA extent. Experiments with an angular FWHM proved too sensitive to complex internal
structure in the ΔT(ℓ, b) distribution to be interpreted easily. The resulting Δθ measures can
be a bit larger than the FWHM-based hockey puck Δθ_p, especially for large puck amplitudes
ΔT_p, but since the same Δθ measure is taken of the HISA model inputs and outputs, the
method is internally consistent.

Figure 10 illustrates how Δθ is measured. From each HISA voxel, the angular offset
θ_off to the nearest non-HISA voxel with the same velocity is found. θ_off(ℓ, b) maps HISA
“skeletons” whose ridge-like maxima equal half the local width of the feature. To build a Δθ
map, we step over all (ℓ, b) positions and write 2 θ_off(ℓ, b) to all points in a new map
for which the angular distance from (ℓ, b) is < θ_off(ℓ, b), where the largest imposed
2 θ_off value is always retained. This yields the angular width of HISA filamentary
structure at that velocity.
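The two-step angular width construction can be sketched for a single channel map as below. This is a brute-force illustration under stated assumptions rather than the authors' implementation: the function name is hypothetical, distances are measured between pixel centers, and the mask is assumed to contain at least one non-HISA pixel.

```python
import math


def angular_width_map(mask, pixel_size=1.0):
    """Local angular width for each HISA pixel of one channel map.

    mask[y][x] is True for HISA pixels; the mask must contain at least one
    non-HISA pixel. Widths are returned in units of pixel_size.
    """
    ny, nx = len(mask), len(mask[0])
    hisa = [(y, x) for y in range(ny) for x in range(nx) if mask[y][x]]
    empty = [(y, x) for y in range(ny) for x in range(nx) if not mask[y][x]]

    # Step 1: offset from each HISA pixel to the nearest non-HISA pixel.
    offset = {}
    for (y, x) in hisa:
        offset[(y, x)] = min(math.hypot(y - ey, x - ex) for (ey, ex) in empty)

    # Step 2: impose 2 * offset over a disk of radius offset around each
    # HISA pixel, always retaining the largest imposed value.
    width = [[0.0] * nx for _ in range(ny)]
    for (y, x), r in offset.items():
        for (qy, qx) in hisa:
            if math.hypot(qy - y, qx - x) < r:
                width[qy][qx] = max(width[qy][qx], 2.0 * r * pixel_size)
    return width
```

The quadratic cost is acceptable only for small test maps; a distance transform would be the natural replacement for step 1 on full survey channel maps.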


3.3.2. Performance Measures 

The HISA extraction software’s performance was measured in three ways: the
“throughput” fraction f_det of model HISA detected; the “true fraction” f_true of HISA
detections corresponding with model input; and the “drifts” ΔΔT, ΔT_U, ΔΔθ, and ΔΔv between
input and output properties. All were measured as functions of the HISA observables
(ΔT, T_U, Δθ, Δv), which define a 4-D parameter space in which the software performance is
evaluated.

The (ℓ, b, v) positions and (ΔT, T_U, Δθ, Δv) properties were first tabulated for all HISA
voxels in both the input and output model (ℓ, b, v) cubes. The 4-D voxel count histograms
N_in,all(ΔT_in, T_U,in, Δθ_in, Δv_in) and N_out,all(ΔT_out, T_U,out, Δθ_out, Δv_out) were
constructed from these voxel tables, using bin dimensions of 2.5 K x 2.5 K x 0.5' x
0.5 km s^-1. In parallel, the voxel count histogram N_in,det was made from all input voxels
with output at the same (ℓ, b, v), and N_out,true was made from all output voxels with input
at the same (ℓ, b, v). Throughputs and true fractions were then derived as
f_det = N_in,det/N_in,all and f_true = N_out,true/N_out,all. From the subset of voxels
appearing in both the input and output HISA cubes, four 4-D drift histograms of the average
changes undergone by ΔT_in, T_U,in, Δθ_in, and Δv_in as functions of
(ΔT_in, T_U,in, Δθ_in, Δv_in) were assembled, e.g., as ΔΔT = <ΔT_out - ΔT_in>, with the
average taken over all HISA voxels in (ℓ, b, v) with the same (ΔT_in, T_U,in, Δθ_in, Δv_in)
properties.
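The bookkeeping behind the throughput, true fraction, and amplitude drift can be sketched with dictionaries keyed by 4-D bin. This is a minimal illustration under stated assumptions, not the authors' code: the voxel tables are represented as hypothetical dictionaries mapping an (ℓ, b, v) index to a (ΔT, T_U, Δθ, Δv) tuple, and only the ΔT drift is computed.

```python
from collections import defaultdict

BIN_WIDTHS = (2.5, 2.5, 0.5, 0.5)  # K, K, arcmin, km/s (bin sizes from the text)


def bin_key(props):
    """Map (dT, T_U, dtheta, dv) to integer 4-D histogram bin indices."""
    return tuple(int(p // w) for p, w in zip(props, BIN_WIDTHS))


def performance(input_voxels, output_voxels):
    """Return (f_det, f_true, drift_dT) dictionaries keyed by 4-D bin.

    Both arguments map a voxel position (l, b, v) to (dT, T_U, dtheta, dv).
    """
    n_in_all, n_in_det = defaultdict(int), defaultdict(int)
    n_out_all, n_out_true = defaultdict(int), defaultdict(int)
    drift_sum = defaultdict(float)

    for pos, props in input_voxels.items():
        key = bin_key(props)
        n_in_all[key] += 1
        if pos in output_voxels:
            n_in_det[key] += 1
            # Drift is binned by the INPUT properties of the voxel.
            drift_sum[key] += output_voxels[pos][0] - props[0]
    for pos, props in output_voxels.items():
        key = bin_key(props)
        n_out_all[key] += 1
        if pos in input_voxels:
            n_out_true[key] += 1

    f_det = {k: n_in_det.get(k, 0) / n for k, n in n_in_all.items()}
    f_true = {k: n_out_true.get(k, 0) / n for k, n in n_out_all.items()}
    drift = {k: drift_sum[k] / n for k, n in n_in_det.items() if n > 0}
    return f_det, f_true, drift
```

Note that f_det is binned by input properties while f_true is binned by output properties, which is why the two fractions occupy different parts of the parameter space for the same voxels.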

The 4-D performance histograms were computed for all 64 HISA models, which were 
identical apart from different random number inputs. The results were merged together into 
a single set of histograms and smoothed with 4-D Gaussians to improve the performance 
measure reliability and coverage of the parameter space. Variable smoothing scales were 
used, because the parameter space coverage was sparser in some areas than in others. The 
smoothing FWHM were 0.5|ΔT|, 0.5|T_U - 70 K|, 0.5Δθ, and 0.5Δv, with minimum values
of 6.0 K, 6.0 K, 1.0', and 1.319 km s^-1 to match the model T_rms and CGPS resolution. This
scheme preserved structure in the well-sampled parts of the parameter space and interpolated 
it smoothly elsewhere. 
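The variable smoothing scales can be written as a small helper. This is a sketch of the stated rule only, with a hypothetical function name; it returns the Gaussian FWHM along each of the four axes at a given point of the parameter space and does not perform the smoothing itself.

```python
def smoothing_fwhm(d_t, t_u, d_theta, d_v):
    """Smoothing FWHM along (dT [K], T_U [K], dtheta [arcmin], dv [km/s])."""
    return (
        max(0.5 * abs(d_t), 6.0),          # floor at the 6 K model noise
        max(0.5 * abs(t_u - 70.0), 6.0),   # scale grows away from 70 K
        max(0.5 * d_theta, 1.0),           # floor at the 1' beam
        max(0.5 * d_v, 1.319),             # floor at the velocity PSF FWHM
    )
```

For instance, at (ΔT, T_U, Δθ, Δv) = (-40 K, 120 K, 10', 1 km s^-1) the rule gives FWHM of 20 K, 25 K, 5', and 1.319 km s^-1, so only the velocity axis is held at its minimum.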

N_out,all(ΔT_out, T_U,out, Δθ_out, Δv_out) histograms were also computed for all 36 mosaic
cubes of real CGPS HISA and summed together to assess the distribution of observed HISA in
the survey. As with the HISA feature catalog in Paper II, sight lines with T_C > 20 K were
excluded.


3.4. Results

3.4.1. Model Parameter Distributions

To examine general (ΔT, T_U, Δθ, Δv) parameter distributions and trends, we made 2-D
projections of the unsmoothed model N_in,all and N_out,all and the real CGPS N_out,all by
summing the counts along the 2 other axes in the 4-D parameter space. A number of these
2-D projections are shown in Figure 11.

The input models fill ranges of 0 > ΔT > -40 K, 30 < T_U < 150 K, 0 < Δθ < 61', and
0.8 < Δv < 16 km s^-1, with peaks at T_U = 70 K and small |ΔT| and Δθ; the Δv distribution
was relatively flat. The peaked ΔT and Δθ distributions occurred despite the shapes chosen
for the puck property distributions (§3.2). The low-|ΔT| peak is due to faint HISA in feature
line wings and spatial envelopes. The low-Δθ peak results from Δθ being measured from
HISA voxels above a minimum |ΔT|, which makes pucks appear smaller off the line center.
In a similar way, pucks with the same Δθ_p have greater Δθ if |ΔT_p| is larger, and no voxels
with Δθ < 1.5' and ΔT < -20 K are found.

The extracted model ΔT peak is shifted to ~ -10 K. A tail of strong absorption
extends to ΔT ~ -60 K. Although they account for only 2% of the total HISA voxels,
these ΔT < -40 K points demonstrate that some ΔT drift occurs. The extracted T_U is
truncated at < 70 K but otherwise appears unchanged from the input model. Large Δθ and
Δv values are both truncated as predicted in §3.1, although less severely for Δθ, since the
non-Gaussian angular profiles better survive the CLEAN process. Iterative extraction (§2)
allows much of the puck structure > 20' to be recovered here, but the purely Gaussian puck
velocity profiles with Δv_p > 8 km s^-1 are CLEANed out of the data with great efficiency.
Δθ peaks at the same location as the input data but is more concentrated. Δv is also now
concentrated toward low values.


3.4.2. Real Parameter Distributions

The CGPS HISA ΔT has a larger range than in the extracted models, due to a few very
strong features like GHISA 079.88+0.62+02 and GHISA 091.90+3.27-03 (see Paper II for
feature details). Its ΔT peak is similar to the models’. The CGPS HISA is truncated for
T_U < 70 K as well as T_U > 135 K, where the maximum H I brightness is reached. Δθ peaks
at the same scale as the model output but has a lower maximum scale, perhaps because real
HISA is more porous. The model output Δv range is slightly exceeded. The Δv peak and
maximum value are both a little broader than for the extracted model HISA.

The CGPS HISA fills almost the same parameter space as the extracted model HISA.
Since some model parameter ranges are larger on input than output, the ranges of real HISA
properties may exceed those observed in the CGPS. HISA with T_U < 70 K is already known
(e.g., Knee & Brunt 2001), and HISA with Δv > 8 km s^-1, Δθ > 33', or |ΔT| > 80 K is
possible, although |ΔT| < T_C + T_U - T_S is required by the 4-component radiative transfer
equation (Paper II, Eqn. 1), where T_S is the spin (excitation) temperature of the absorbing
gas. Also, Δθ is limited by feature porosity. GHISA 091.90+3.27-03 exceeds the Δθ limit
in gross extent but is not completely solid. Larger features are known (e.g., the Riegel &
Crutcher 1972 “cold cloud” toward the Galactic center), but their porosity at 1' resolution
has not been reported.

The input models had no built-in correlations of feature properties, and the same is
largely true for the real HISA. Certainly ΔT and T_U are not related in the CGPS, except
that the strongest ΔT’s prefer some T_U values over others. However, the peak CGPS Δθ and
Δv both increase gradually with |ΔT| out to ΔT = -40 K, Δθ = 20', and Δv = 4 km s^-1.
These trends have considerable scatter, and there are weaker versions in the extracted model
HISA. But if they reflect real HISA behavior, then stronger absorption is more likely to
have larger contiguous angular structure or broader linewidths, although Δθ and Δv do not
correlate as well with each other as they do with ΔT.


3.4.3. Throughput and True Fraction

Figure 12 presents selected 2-D slices through the 4-D parameter space to illustrate
the behavior of the throughput f_det and true fraction f_true. These have been smoothed
as described in §3.3.2. We find that most HISA is detected if it is significantly stronger
than the noise, larger than a few beams, narrower than a few km s^-1, and has T_U > 80 K.
Furthermore, the vast majority of detected HISA is reliable, except for HISA that can be
mimicked by beam-scale noise fluctuations.

The throughput is high for much of the parameter space: f_det ≳ 0.80 where ΔT <
-20 K, T_U > 80 K, Δθ > 5', and Δv < 3.5 km s^-1, reaching a maximum of ~ 0.99 where
all of these criteria are well met. Where one or more of them is not met, f_det drops rapidly,
with f_det → 0 for ΔT > -2 K, T_U < 60 K, Δθ < 1', or Δv > 8 km s^-1. Most of this behavior
can be explained as losing features in the noise, underbright T_U, or overbroad linewidths
poorly fitted by the spectral search method’s W_broad [4 km s^-1] Gaussians (§2.2). However,
low f_det seems to occur for low Δθ even when |ΔT| is large. This suggests some beam-scale
HISA features may be missed by our search if they are isolated from larger structures. The
Perseus HISA Globule of Paper I is detected easily, but it is also attached to the complex
GHISA 139.01+0.96-40.

By contrast, narrow-line HISA is detected with great efficiency: high f_det is found for
Δv as low as the 0.8 km s^-1 CGPS channel width, so long as ΔT < -10 K. To allay the
concerns of Li & Goldsmith (2003), lines narrower than this should also be detectable if their
intrinsic amplitudes are larger to compensate for spectral dilution. A thermally broadened
T_S = 10 K HISA line would be diluted by a factor of 2.4 but easily detected if ΔT < -24 K.
HISA of this strength or greater is common in the CGPS.

The true fraction has much simpler behavior: f_true ~ 1.0 almost everywhere that
ΔT < -20 K, Δθ > 3', or Δv > 3 km s^-1. If none of these holds, noise fluctuations in the
data produce significant false positive detections, with f_true ~ 0.2 in the worst cases.


3.4.4. Parameter Drift

Figure 13 illustrates trends in the parameter drifts ΔΔT, ΔΔθ, and ΔΔv. Since T_U =
T_ON - ΔT (§2.4) and T_ON is fixed, ΔT_U = -ΔΔT. With minor exceptions, the behavior of
all the parameter drifts is fairly simple: |ΔT| and T_U are often underestimated by a few K
in well-detected features, while incomplete detections of large features (e.g., Fig. 9) cause
Δθ and Δv to be underestimated as well.

The drift in ΔT is positive, i.e., toward reduced amplitudes, if these three conditions
are met: |ΔT| > the 6 K noise level, T_U > 80 K, and Δθ > 2'. If one of them is not met,
ΔΔT < 0. There is no strong dependence on Δv. The amount of drift is typically a few K,
with a range of ±10 K in most areas but more negative for T_U < 65 K. The ΔΔT behavior
has a similar shape to f_det above, suggesting that detection sensitivity governs ΔT drift.
Features with intrinsically low |ΔT|, low T_U, or very small Δθ appear to have larger |ΔT|
(and T_U) if they are detected. But f_det shows most are not detected; those that are represent
a biased sample in which |ΔT| and T_U happened to be boosted in the right direction to make
them detectable. By contrast, easily detectable features appear to have lower |ΔT| than they
should. Since ΔT_U = -ΔΔT, their T_U is underestimated as well; one of the mechanisms
noted in §3.1 may be to blame. But whether positive or negative, the magnitude of ΔΔT is
usually only a few K. The large drifts that produced the ΔT outliers in the model N_out,all
results (§3.4.1; Fig. 11) are exceptional cases.

The drift in Δθ is negative everywhere that Δθ > 1'. It covers a range of -50' <
ΔΔθ < +1', becoming steadily more negative for larger Δθ, with only minor dependencies
on other parameters. The small positive drifts occur when Δθ < 1' HISA is augmented by
beam-scale noise fluctuations; Δθ < 1' can occur in the line wings of 1' features, since pucks
appear smaller off the line center (§3.4.1). The much larger negative drifts are from those
large features that are incompletely detected, as confirmed by visual inspection of the
(ℓ, b, v) data (e.g., Fig. 9). Some of these partial detections are caused by T_U ~ 70 K
boundaries, but most are from noise. 2- and 3-σ noise fluctuations frequently poke holes in
fairly strong features, reducing their Δθ measures. This is especially common in the line
wings where |ΔT| is less.

The drift in Δv is positive only where both Δθ < 2' and Δv < 1 km s^-1 and is negative
everywhere else. It covers a range of -13 km s^-1 < ΔΔv < +0.2 km s^-1, becoming steadily
more negative for larger Δv, with only minor dependencies on other parameters. As with
ΔΔθ, noise degradation is a major cause of this ΔΔv trend. But in addition, the spectral
HISA search itself is optimized for the detection of HISA linewidths < 4 km s^-1, and
Gaussian features broader than 8 km s^-1 are CLEANed out almost entirely (§3.4.3). Lastly,
features near emission peaks may have underestimated T_U (§3.1), leading to |ΔT|
underestimates in line wings, and thus Δv underestimation. Since our T_U estimation is not a
simple linear interpolation (§2.4), this effect may not be as severe as that noted by
Levinson & Brown (1980), but the ΔΔT > 0 results above for well-detected features suggest it
is not zero either.

Levinson & Brown (1980) also note that T_U gradients will cause Δv to be narrower than
the HISA optical depth profile FWHM, and the line center in ΔT will appear shifted from
τ_max, the maximum optical depth. However, the performance of our HISA software is only
concerned with ΔT, so these biases do not apply here. And while our voxel-based evaluation
method is not able to track changes of position, visual inspection shows that filaments and
other structures within features do not shift between input and output; only the centroids
of whole features may shift if the features are not completely detected.


3.5. Reliability and Completeness 

These results have many uses. In addition to statistically describing how well HISA
is detected, they can be applied directly to particular features to assess their reliability
and completeness. The former is done by measuring (ΔT, T_U, Δθ, Δv) at each HISA voxel
(ℓ, b, v) position and interpolating f_true from the 4-D histogram described in §3.4.3. f_true
is the likelihood that a HISA detection represents real absorption. We have determined
f_true for each CGPS HISA voxel. Figure 14 shows f_true contours on a sample HISA feature.
The HISA detection reliability in this case is quite high. In Paper II, we use f_true(ℓ, b, v)
in analyses of total CGPS HISA coverage and the distributions of weak and strong absorption.
We also assess the completeness of our HISA detections. The actual detection fraction f_det
is not recoverable from observed data, but we consider the fraction of detections with
<T_U> > 80 K, since such HISA has f_det > 0.8 if its size and strength are appreciable
(§3.4.3).
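The reliability thresholds quoted with Figure 14 express the true fraction as an equivalent two-sided Gaussian confidence level. The conversion can be sketched with the Python standard library as follows; the function names are hypothetical, and the equivalence holds only if Gaussian statistics apply, as the caption itself cautions.

```python
import math
from statistics import NormalDist


def sigma_to_fraction(n_sigma):
    """Two-sided Gaussian probability enclosed within +/- n_sigma."""
    return math.erf(n_sigma / math.sqrt(2.0))


def fraction_to_sigma(f_true):
    """Equivalent Gaussian sigma level for a true fraction 0 < f_true < 1."""
    return NormalDist().inv_cdf(0.5 * (1.0 + f_true))
```

Evaluating `sigma_to_fraction` at 1, 2, 3, and 4 reproduces the contour levels of Figure 14 to within rounding, and `fraction_to_sigma(0.999999)` gives a value close to the 4.90 σ quoted for the map maximum.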


4. Conclusions 

We have described algorithms that identify and extract H I self-absorption (HISA) 
features in high-resolution H I 21cm line data cubes. These algorithms were designed to 
carry out a HISA survey of cold H I in the initial 73° x 9° phase of the arcminute-resolution 
Canadian Galactic Plane Survey (CGPS), but they should have more general applicability. 

Our search algorithms use CLEAN-based spatial and spectral filtering to remove
large-scale emission structure and identify HISA as significant negative residuals. Features
identified in both spectral and spatial domains are flagged as HISA, and the unabsorbed
brightness T_U along the feature sightline is estimated from a 3-D interpolation of the
OFF-feature brightness temperature T_OFF. HISA detections in overly noisy regions are
rejected, as are those for which T_U < 70 K, lest significant false detections result from
gaps between sharply structured emission features with faint backgrounds. In order to
capture features larger than the CLEAN filter scale, identified HISA is removed and the
search process is repeated; a total of three such passes suffices for the CGPS data.

We performed detailed tests of our HISA-finding software with model data to determine
its detection limits, false positive rates, and measurement biases as functions of feature size,
amplitude, and background field brightness. The tests show that HISA is well detected within
the software design criteria, with high detection rates for HISA significantly stronger than
the noise level, larger than a few beams, narrower than a few km s^-1, and with T_U > 80 K.
At the same time, the bulk of HISA detections are reliable, with very low false positive rates
in most parts of the parameter space except those occupied by beam-scale noise fluctuations.
Measurement drifts are small in well-detected features, with |ΔT| underestimated by a few K
due to contamination of T_OFF by faint, undetected HISA near the feature. Where detections
are truncated by noise fluctuations or faint T_U, the bias may be somewhat larger. Incomplete
detections also make features appear smaller in angular size and linewidth than in reality
due to truncation.

This paper is the third in an ongoing series investigating HISA at high resolution in the 
Galactic plane. A companion paper (Paper II) presents HISA survey results for the CGPS.
Subsequent papers will further analyze the CGPS HISA and also examine HISA in CGPS 
extensions and in the VLA Galactic Plane Survey. 


A. Corrections to Paper I Results

The H I data in the vicinity of (ℓ = 140°, b = +1°) have been revised from those used
in Paper I. A single synthesis field was assigned the wrong flux scale in the H I data used
in that paper, and this error was not discovered until after publication. As a result, the
HISA amplitudes presented in Paper I for the Perseus HISA Complex and Globule features
were in error, and the correct HISA amplitudes are smaller than those found in Paper I.
With revised data, these features have warmer spin temperatures and lower optical depths
than those derived in Paper I, but the column densities and masses are only mildly affected.
Table 1 lists the corrected results for both features. The correct Globule spectrum is plotted
in Figure 1, and the positions of both features are marked in Figure 5. The Local HISA
Filament presented in Paper I was unaffected by this problem.

Table 1. Corrected Perseus HISA Complex and Globule Properties

                            Perseus Complex             Perseus Globule
H I Data:                   Paper I      Revised        Paper I      Revised

Input Parameters†
T_ON [K]                    69           71             47           62
T_U [K]                     107          99             112          104
ΔT [K]                      -38          -28            -65          -42

Derived Gas Properties (p = f_n = 1)
T_S [K]                     45-61        49-65          32-35        41-43
τ                           0.83-1.37    0.71-1.22      1.43-1.54    0.94-0.99
N_HISA [10^20 cm^-2]        3.2-7.3      3.0-6.8        2.2-2.6      1.8-2.0
n_HISA [cm^-3]              89-65        81-62          124-115      99-94
M_HISA [M☉]                 31-111       32-106         0.60-1.09    0.53-0.80

Derived Gas Properties (f_n = 0.01, Maximum Total Mass)
T_S [K]                     2.7          2.7            2.7          2.7
τ                           7.0          6.9            2.5          2.4
N_HISA [10^20 cm^-2]        1.7          1.6            0.33         0.32
n_HISA [cm^-3]              15           15             15           15
M_HISA [M☉]                 26           25             0.14         0.12
N_tot [10^20 cm^-2]         170          160            33           32
n_tot [cm^-3]               1500         1500           1500         1500
M_tot [M☉]                  5200         5000           28           25

*This table follows the format of Table 1 in Paper I.

†Only input parameters that have changed from Paper I are shown.

Fig. 1. The full-resolution observed spectrum O(k) at the (ℓ = 139.635°, b = 1.185°)
position of the Perseus HISA Globule of Paper I. The feature’s absorption amplitude has
changed from Paper I due to correction of a data processing error that had no serious impact
on the derived results (see Appendix A).

Fig. 2. Velocity profiles showing HISA spectral detection stages for the Perseus HISA
Globule position. The derived unabsorbed spectrum U(k) and the spatially smoothed
observed spectrum S(k) are shown in the top portion of the figure. Below them is the residual
or difference spectrum R(k) (zero level at -10 K). The dashed line gives the level below which
HISA is suspected. Eight channel segments are indicated where this is the case. Gaussian
fitting was accepted only for segment 5 (see text), and the resulting narrow and broad HISA
spectra are shown below this (zero levels at -60 K and -90 K, respectively). Finally, the
“detected” HISA spectrum is shown at bottom (zero level at -120 K).






Fig. 3. Latitude profiles showing HISA spatial detection stages for the Perseus HISA
Globule position (at which the Galactic latitude offset = 0°). From top to bottom, cuts
are taken through the observed channel map O(i,j), the smoothed observed map S(i,j)
and derived unabsorbed map U(i,j) (zero levels at -50 K), the smooth difference map
S(i,j) - U(i,j) (solid) and noise truncation level -2σ_sm (dashed) (zero level at -20 K), the
unsmooth difference map O(i,j) - U(i,j) (solid) and noise truncation level -2σ_obs (dashed)
(zero level at -70 K), and the suspected HISA features map prior to final amplitude culling
(zero level at -170 K). Fast Fourier transforms (FFTs) used in the CLEANing process make
border areas of the map unusable after S(i,j) and U(i,j) are determined.





Fig. 4. (ℓ, b) channel maps of sample Perseus HISA at -41 km s^-1, showing the same
area as Figure 1 of Paper I. The panels give CGPS H I, HISA ΔT from the spectral search
(§2.2), spatial search (§2.3), and their intersection, and the dual ΔT and T_U from the full
3-D extraction (§2.4). Intensity ranges are +40 to +130 K for the first and last panels and

Fig. 5. Detailed views of each panel in Figure 4, showing the same area as Figure 2
of Paper I. The Perseus HISA Complex and Globule positions of that paper are marked.
Sample velocity spectra of the Globule are given in Figure 6.

Fig. 6. Single-pixel velocity spectra at the Perseus HISA Globule position (ℓ =
139.635°, b = 1.185°), showing CGPS H I, HISA |ΔT| from the spectral (§2.2) and
spatial searches (§2.3), and the final ΔT and T_U from the full 3-D extraction (§2.4); the
final ΔT zero point has been shifted to 150 K for clarity. Although T_U appears below the
observed T_OFF at the HISA velocity edges, this spectrum shows only a small subset of the
T_OFF voxels that surround the feature in 3-D. T_U is estimated from the entire 3-D T_OFF
set, a larger sample of which is shown in the corresponding channel map in Figure 5.











Fig. 7. (ℓ, b) channel maps illustrating multiple-pass extraction of a large HISA complex
in the CGPS MK2 mosaic cube at -3 km s^-1. Shown are H I brightness, first-pass ΔT,
third-pass ΔT, and corresponding T_U. Intensity ranges are 0 to +130 K for the first and last
panels and -65 to 0 K for the two ΔT maps. The small 6' x 6' box (ℓ = 91.20°, b = +2.97°)
marks the area from which the spectra in Figure 8 were extracted. The feature extraction
is truncated for b > +4.5° due to T_U < 70 K (see §2.4).

Fig. 8. Spatially averaged velocity spectra illustrating multiple-pass HISA extraction. The
spectra are extracted from the 6' x 6' box marked in Figure 7 at (ℓ = 91.20°, b = +2.97°).
Shown are H I emission, first-pass HISA ΔT, third-pass HISA ΔT, and third-pass HISA T_U.
For clarity, the ΔT zero-points have been shifted to 160 K.

Fig. 9. (ℓ, b) and (ℓ, v) slices of sample model H I data, showing the input HISA “hockey
puck” amplitude ΔT_in, the noisy background emission field with pucks added, and the
extracted HISA amplitude ΔT_out after the 3rd identification pass. Intensity ranges are -40
to 0 K for the ΔT maps and 0 to 120 K for the H I maps, from black to white.

Fig. 10. Channel maps illustrating velocity width and angular width measures: (a) HISA
absolute amplitude |ΔT|; (b) Δv, the line full width at half maximum; (c) 2 x θ_off, where
θ_off is the offset to the nearest HISA feature edge; and (d) Δθ, the angular width obtained
from 2 θ_off values imposed out to a radius θ_off. Intensity ranges are linear, from white to
black, for 0 K < |ΔT| < 40 K, 0 km s^-1 < Δv < 6 km s^-1, 0' < 2 θ_off < 10', and
0' < Δθ < 10'. See §3.3.1 for further details.

Fig. 11. 2-D projections of 4-D property histograms of input voxels (model N_in,all),
extracted voxels (model N_out,all), and observed CGPS HISA (real N_out,all). Axis labels are
“dT” = ΔT, “Tu” = T_U, “da” = Δθ, and “dv” = Δv. Counts were summed along the orthogonal
axes, so the full distributions are visible. No significant trends in (T_U, Δθ) or (T_U, Δv)
were found. The intensity scale is logarithmic from 1 count (light) to 1 million counts (dark).
Contours mark counts at five successive powers of ten in all panels except the (ΔT, Δθ) and
(Δv, Δθ) maps of the model N_in,all, where two of these levels are omitted.

Fig. 12. 2-D slices through the 4-D throughput f_det and true fraction f_true histograms.
The f_det slices intersect at a common position marked with a cross. The f_true slices
intersect at a different common position, also marked. The intensity scale is linear, from
0.0 (white) to 1.0 (black). Black contours mark values of 0.1, 0.3, and 0.5; white contours
mark values of 0.7 and 0.9. Axis labels are as in Figure 11.

Fig. 13. 2-D slices through the 4-D drift histograms ΔΔT, ΔΔθ, and ΔΔv. As in
Figure 12, crosses mark slice intersections for each 4-D drift measure. The intensity scale is
linear, from negative (black) to positive (white). The intensity ranges are -10 K < ΔΔT <
+10 K, -50' < ΔΔθ < +1', and -13 km s^-1 < ΔΔv < +1 km s^-1. Where present, a thick
black contour marks zero drift. Thinner contours mark positive (black) and negative (white)
drifts at intervals of 5 K, 10', and 2 km s^-1, respectively. Axis labels are as in Figure 11.

Fig. 14. Sample extracted HISA ΔT(ℓ, b) map for a single velocity, showing contours of
f_true = 0.682690, 0.954500, 0.997300, and 0.999937, which correspond to reliability
thresholds of 1, 2, 3, and 4 σ if Gaussian statistics apply. The maximum f_true in this map is
0.999999, which is equivalent to 4.90 σ. The region shown is the same as in Figure 10.




